Abstract

We propose alternative discriminant measures for selecting the best
basis among a large collection of orthonormal bases for classification
purposes. A generalization of the Local Discriminant Basis Algorithm of
Saito and Coifman is constructed. The success of these new methods is
evaluated and compared to earlier methods in experiments.

1 Introduction

This paper is the result of my trying to improve the method applied in Fossgaard
(1997) to discriminate between two distinct classes/types of signals by using
expansions of the data in wavelet packet/local trigonometric bases. This method
was first invented and described by N. Saito and R. Coifman. For a thorough
exposition on this theme, I refer to Saito (1994) and Saito, Coifman (1996); a
brief summary of the main ideas is given below.

Each signal belonging to a training dataset is decomposed in a time/space-
frequency dictionary, that is, a large collection of orthonormal bases arranged
in a binary tree structure, containing either wavelet packet basis functions or
local trigonometric basis functions. A measure of energy density is then
computed for each coordinate in the dictionary for each class of signals.
Originally, in Saito (1994), this is taken to be the square of the coordinate
summed over all the training signals belonging to a class, normalized by the
total energy of the training signals in that class. Then a basis called the
"Local Discriminant Basis", LDB for short, is chosen from the dictionary by
maximizing a certain discrimination measure, defined by some additive cost
functional, over the dictionaries of energy densities. The coordinates where
the discrimination measure takes on its largest values are called the most
important features of the signals. These coordinates are selected from the LDB
and used as input for some classifier.

This method is very powerful in many cases, but it also has its weaknesses. A
serious one is that the LDB is not able to distinguish two signals both consisting
exclusively of one and the same basis element, only with opposite sign. One way
of dealing with this problem is described in Saito, Coifman (1996), where one
estimates the probability density functions (pdf's) of the projections onto the
different basis elements in the dictionary, and selects the basis which maximizes
some well-chosen functional on these pdf's.

In this paper, I will try to improve on the LDB method described above
by constructing new discrimination measures that yield more relevant features.
I will also try to improve the performance of the algorithm by using several
LDB's in sequence, and by using a classifier specially designed to fully utilize
the increased degree of freedom that multiple LDB's (MLDB's) give us in selecting
the features that are most important to our problem.



2 The original LDB method 

The problem as expressed in Saito (1994) is optimizing a linear map
$d : \mathcal{X} \to \mathcal{Y}$, where
$\bigcup_{y \in \mathcal{Y}} \mathcal{X}^{(y)} = \mathcal{X} \subset \mathbb{R}^n$
is the input signal space, $\mathcal{Y} = \{1, 2, \ldots, N\}$ is the
output class space, a set of class labels, and $\mathcal{X}^{(y)}$ is the
subspace of class $y$ signals. To optimize the map $d$, one considers maps of
the form

$$ d = c \circ T_K \circ \Psi_{n \times n}, \qquad (1) $$

where the feature extractor $\Psi_{n \times n} \in O(n)$ is an orthogonal
$n \times n$ matrix which extracts the $n$ most relevant coordinates from a
binary-tree dictionary of wavelet packet bases or local trigonometric bases,
$T_K$ is a feature selector which selects the $K < n$ most important
coordinates from the $n$ most relevant coordinates, and $c$ is a classifier.
The problem then is to choose $c$, $T_K$ and $\Psi_{n \times n}$ such that the
rate of misclassification of the map $d$ is minimized on the set $\mathcal{X}$.
In Saito (1994), $\Psi_{n \times n}$ is taken to be

$$ \Psi_{n \times n} = \arg\max_{B_k \in \mathcal{D}_i \in \mathcal{L}} \lambda(B_k), \qquad (2) $$

where $\mathcal{L} = \bigcup_i \mathcal{D}_i$ is the library of all
dictionaries at our disposal corresponding to the different wavelet or local
trigonometric basis functions under consideration, the $B_k$ are all bases in
$\mathcal{D}_i$, and $\lambda$ is a measure of performance of the basis $B_k$
in the classification problem; such a measure is called a discrimination
measure. The search for this $\Psi_{n \times n}$ is fast by the best-basis
algorithm of Wickerhauser and Coifman if the measure $\lambda$ satisfies an
additivity property (Saito 1994). In Saito (1994) the discrimination measure
$\lambda$ is defined as

$$ \lambda(B_k) = \sum_{w_m \in B_k} \gamma(\Gamma^{(1)}(w_m), \ldots, \Gamma^{(N)}(w_m)), \qquad (3) $$

where the time-frequency energy map $\Gamma^{(y)}$ is defined by

$$ \Gamma^{(y)}(w_m) = \frac{\sum_{j=1}^{N_y} (w_m \cdot x_j^{(y)})^2}{\sum_{j=1}^{N_y} \|x_j^{(y)}\|_2^2}, \quad x_j^{(y)} \in \mathcal{X}^{(y)}, \; 1 \le y \le N, \; N_y = |\mathcal{X}^{(y)}|, \qquad (4) $$

and $\gamma$ can be some form of $\ell^p$-distance, Hellinger distance or relative entropy.
The signals $x_j^{(y)} \in \mathcal{X}^{(y)}$, $1 \le y \le N$, are fed into
each dictionary, the best basis is picked out by the best-basis algorithm, and
then the best $K$ coordinates are selected from this basis, ordinarily by
selecting the coordinates where $\lambda$ takes on its $K$ greatest values. The
corresponding $K$ best basis elements are then used to construct a classifier
by doing a "Linear Discriminant Analysis" (LDA) or a "Classification and
Regression Trees" (CART) analysis, or some other statistical classification
technique, on the coordinates of the signals in these $K$ best basis elements.
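
The following is a minimal sketch of the energy map and of the per-coordinate
discrimination scores for two classes, assuming the squared l2 distance as the
measure of discrepancy and assuming that the expansion coefficients of every
training signal in one fixed, complete orthonormal basis are already available
(so that the signal energy equals the sum of the squared coefficients). The
function names `energy_map`, `l2_scores` and `top_k` are illustrative only and
do not come from the original implementation.

```python
def energy_map(coeffs):
    """Normalized energy per coordinate for one class.

    coeffs: list of coefficient vectors, one per training signal, holding
    the coordinates w_m . x_j in one fixed orthonormal basis (assumption).
    """
    # By orthonormality, the total signal energy equals the coefficient energy.
    total_energy = sum(c * c for row in coeffs for c in row)
    n = len(coeffs[0])
    return [sum(row[m] ** 2 for row in coeffs) / total_energy for m in range(n)]


def l2_scores(gamma_1, gamma_2):
    """Per-coordinate squared difference between two class energy maps."""
    return [(a - b) ** 2 for a, b in zip(gamma_1, gamma_2)]


def top_k(scores, k):
    """Indices of the k coordinates with the largest scores."""
    order = sorted(range(len(scores)), key=lambda m: scores[m], reverse=True)
    return order[:k]


# Toy data: class 1 lives mostly on coordinate 0, class 2 on coordinate 2.
class_1 = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
class_2 = [[0.0, 0.0, 1.0], [0.0, 0.6, 0.8]]
scores = l2_scores(energy_map(class_1), energy_map(class_2))
best = top_k(scores, 2)
```

Summing `scores` over the coordinates of a basis gives the additive value that
a best-basis search would compare across bases; the search itself is not shown.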



3 A generalized LDB method 
3.1 New Discrimination Measures 

Using the notation from the previous section, for each basis vector $w_m$ in
some basis $B_k$, let $Z_{y,m}$ be the random variable on the space
$\mathcal{X}^{(y)}$ of input signals of class $y$ defined by

$$ Z_{y,m} : x \in \mathcal{X}^{(y)} \to [-1, 1], \quad Z_{y,m}(x) = w_m \cdot x. \qquad (5) $$

In Saito, Coifman (1996) one estimates the empirical pdf $p$ of $Z_{y,m}$. These
estimates are then used to find the most discriminating basis. But getting good
estimates of the pdf's is hard and computationally demanding. We will take a
different approach and work on the a priori assumption that $p$ is the uniform
distribution. For each fixed $w_m \in B_k$, we can then compute the empirical
expectation $E[Z_{y,m}]$ of the basis coordinate $w_m \cdot x$ for class $y$
signals as

$$ E[Z_{y,m}] = \sum_{x \in \mathcal{X}^{(y)}} p(Z_{y,m} \mid Y = y) \, Z_{y,m}(x). \qquad (6) $$

If $\|x\|_2 = 1, \forall x \in \mathcal{X}$, then in this probabilistic
setting, (4) is equivalent to $\Gamma^{(y)}(w_m) = E[Z_{y,m}^2]$. We will first
consider two-class problems, $\mathcal{Y} = \{1, 2\}$, and deal with n-class
problems later. Choosing $\gamma$ = $\ell^2$-distance squared, (3) becomes

$$ \lambda(B_k) = \sum_{m : w_m \in B_k} \left( E[Z_{1,m}^2] - E[Z_{2,m}^2] \right)^2. \qquad (7) $$

We see that with this $\lambda$, the best basis given by (2) is the basis
maximizing the sum of the squared distances between the expected values of all
the squared basis coordinates for the two classes. Now, we observe that the
measure of performance (7) of the basis $B_k$ does not consider how the data is
distributed around the expected values. For example, if:



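$$ I_{1,m} = \left[ E[Z_{1,m}^2] - \sqrt{Var[Z_{1,m}^2]}, \; E[Z_{1,m}^2] + \sqrt{Var[Z_{1,m}^2]} \right], $$

$$ I_{2,m} = \left[ E[Z_{2,m}^2] - \sqrt{Var[Z_{2,m}^2]}, \; E[Z_{2,m}^2] + \sqrt{Var[Z_{2,m}^2]} \right], $$

where $Var[Z_{y,m}^2]$ is the empirical variance of $Z_{y,m}^2$, then it may
well happen that $I_{1,m} \cap I_{2,m} \neq \emptyset$, even if
$w_m \in \arg\max_{B_k} \lambda(B_k)$.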
Ideally, we want a basis $B$ where the overlap $O(B)$ given by

$$ O(B) = \sum_{m : w_m \in B} |I_{1,m} \cap I_{2,m}| $$

is as small as possible. That is, a basis which simultaneously is discriminating
between classes and has the opposite property inside classes. This motivates the
following definition of a new discrimination measure $\lambda'$ by

$$ \lambda'(B_k) = \sum_{m : w_m \in B_k} \frac{\left| E[Z_{1,m}^2] - E[Z_{2,m}^2] \right|}{\left( Var[Z_{1,m}^2] + Var[Z_{2,m}^2] \right)^{1/2}}. \qquad (8) $$

Note how the performance measure in (8) differs from the measure in (7). We
see that the numerator in (8) measures the separability of datapoints between
the classes 1, 2, and the denominator measures the dispersion of the datapoints
inside each of the classes 1, 2. Neither of the measures $\lambda'$, $\lambda$
captures differences between classes in sign in the basis coordinates. To
improve on this, we define the measure $\lambda''$ by

$$ \lambda''(B_k) = \sum_{m : w_m \in B_k} \frac{E[(Z_{1,m}(x) - Z_{2,m}(x''))^2]^{1/2}}{E[(Z_{1,m}(x) - Z_{1,m}(x'))^2]^{1/2} + E[(Z_{2,m}(x'') - Z_{2,m}(x'''))^2]^{1/2}}, \quad x, x' \in \mathcal{X}^{(1)}, \; x'', x''' \in \mathcal{X}^{(2)}. \qquad (9) $$

We see that the numerator in (9) measures the separability of signed datapoints
between the classes 1, 2, and the denominator measures the dispersion of signed
datapoints inside these classes.
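
To make the difference between the three measures concrete, here is a hedged
sketch of the contribution of a single basis coordinate to (7), (8) and (9).
It assumes uniform weights over the training signals, reads the denominator of
(9) as the sum of the two within-class spreads, and adds a small constant
`eps` to the denominators to avoid division by zero; the function names and
`eps` are illustrative assumptions, not part of the original method.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def score_energy(z1, z2):
    """One term of (7): squared difference of mean squared coordinates."""
    return (mean([z * z for z in z1]) - mean([z * z for z in z2])) ** 2


def score_normalized(z1, z2, eps=1e-12):
    """One term of (8): mean separation over within-class dispersion."""
    s1 = [z * z for z in z1]
    s2 = [z * z for z in z2]
    return abs(mean(s1) - mean(s2)) / (math.sqrt(variance(s1) + variance(s2)) + eps)


def rms_difference(u, v):
    """Root mean square of u_i - v_j over all pairs (i, j)."""
    return math.sqrt(mean([(a - b) ** 2 for a in u for b in v]))


def score_signed(z1, z2, eps=1e-12):
    """One term of (9) (assumed form): signed between/within spread ratio."""
    within = rms_difference(z1, z1) + rms_difference(z2, z2)
    return rms_difference(z1, z2) / (within + eps)


# Two classes concentrated on the same coordinate with opposite sign.
z_class_1 = [0.9, 1.0, 1.1]
z_class_2 = [-0.9, -1.0, -1.1]
blind_7 = score_energy(z_class_1, z_class_2)
blind_8 = score_normalized(z_class_1, z_class_2)
signed_9 = score_signed(z_class_1, z_class_2)
```

On this toy input the two measures built on squared coordinates vanish, while
the signed measure is large, which is the behaviour the text attributes to them.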



3.2 Construction of an Oracle Classifier Using Multiple 
LDB's 

The construction is due to the following observation: Having chosen a best basis
$\Psi^*_{n \times n}$, where
$\Psi^*_{n \times n} = \arg\max_{B_k \in \mathcal{D}_i \in \mathcal{L}} \lambda(B_k)$
and $\lambda$ is some discrimination measure, there are subsets $S_j$ of the
set $\mathcal{X}$ of input signals on which $\Psi^*_{n \times n}$ works better
than on other subsets. That is, the signals in disjoint sets $S_j$ have
significant differences in how they distribute their energy among the different
elements in the basis $\Psi^*_{n \times n}$. More precisely: Let $W_K$ be the
feature space of dimension $K < n$ spanned by the $K$ most important elements
in the best basis $\Psi^*_{n \times n}$, sorted in decreasing order of
importance, and let $P_{W_K}$ be the orthogonal projection onto $W_K$.

Now, consider the sets $A^{(k)}$ and $B^{(k)}$ of points in $k$-dimensional
euclidean space given by
$A^{(k)} = \{P_{W_k} x_j\}_{x_j \in \mathcal{X}^{(1)}} \subset [-1, 1]^k$,
$B^{(k)} = \{P_{W_k} x_j\}_{x_j \in \mathcal{X}^{(2)}} \subset [-1, 1]^k$.
It is clear by the definition of $W_k$ that the two point clouds $A^{(k)}$
and $B^{(k)}$ should be concentrated in more or less disjoint regions in
$[-1, 1]^k$ if the two classes are separable by our method; that is, we should
observe clustering when plotting the points of $A^{(k)}$ and $B^{(k)}$ in
$[-1, 1]^k$ and labeling each point after its class.

We sort out clusters by the following recursive algorithm. 
Algorithm 3.1 The Dyadic Cluster Search Algorithm (DCSA). Given appropriately
chosen numbers $n > K > 1$, $1 > \delta > 0$, $1 > \eta > 0$,
$1 > \mu > \nu > 0$.

Step 0: Choose a performance measure $\lambda$ from those defined in Section
3.1, or some other favourite measure. Set $\beta = \lceil \nu |\mathcal{X}| \rceil$,
$j_A = \lceil \eta |\mathcal{X}^{(1)}| \rceil$,
$j_B = \lceil \eta |\mathcal{X}^{(2)}| \rceil$, $I = [-1, 1]$.

Step 1: Select the feature spaces $W_k$ by the formula (2) and truncate to the
$K < n$ most important basis elements. Compute the sets $A^{(k)}$, $B^{(k)}$,
$1 \le k \le K$, as defined above. Set $\Delta = 0.0$, $k = 1$, $C^{(k)} = I^k$,
$C^{(k)}_{next} = I^k$, FoundCluster = 0.

Step 2: Set $A = A^{(k)}$, $B = B^{(k)}$, $C = C^{(k)}$,
$C_{next} = C^{(k)}_{next}$. If $|A| < j_A$ and $|B| < j_B$, terminate the
algorithm. Else, compute $\alpha = \lceil \mu (|A| + |B|) \rceil$,
$N_A(C) = |A \cap C|$, $N_B(C) = |B \cap C|$.

Step 3: If $N_A + N_B > \max(\alpha, \beta)$, compute the error rate
$e = \min(N_A, N_B)/(N_A + N_B)$ and proceed to the next step. Else, if
$C_{next} \neq C$, set $C = C_{next}$ and jump to Step 2. Else, if
$C_{next} = C$ and FoundCluster = 1, jump to Step 1. Else, if $C_{next} = C$,
FoundCluster = 0 and $k < K$, set $k = k + 1$ and jump to Step 2. Else, if
$C_{next} = C$, FoundCluster = 0 and $k = K$, set $k = 1$,
$\Delta = \Delta + \delta$ and jump to Step 2.

Step 4: If $e < \Delta$, store the location of the cube $C$ together with the
numbers $N_A$, $N_B$ and identification of the $k$ basis elements defining the
space $W_k$. Then, for each index $i \in \{1, 2, \ldots, K\}$, set
$A^{(i)} = A^{(i)} - \{P_{W_i} x_j : x_j \in \mathcal{X}^{(1)}, P_{W_k} x_j \in C\}$,
$B^{(i)} = B^{(i)} - \{P_{W_i} x_j : x_j \in \mathcal{X}^{(2)}, P_{W_k} x_j \in C\}$,
and for each $x_j \in \mathcal{X}^{(i)}$ with $P_{W_k} x_j \in C$, set
$\mathcal{X}^{(i)} = \mathcal{X}^{(i)} - \{x_j\}$, $i = 1, 2$. Set
FoundCluster = 1, $\Delta = 0.0$, $k = 1$, and jump to Step 2. Else, divide
$C$ into $2^k$ subcubes $C_1, \ldots, C_{2^k}$ by splitting each of the
sidelengths of $C$ into two sides of equal length, and for each index
$i = 1, 2, \ldots, 2^k$, jump to Step 2 with $C = C_i$, where
$C_{next} = C_{i+1}$ for $1 \le i < 2^k$ and $C_{next} = C$ for $i = 2^k$.
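
The core of the search, the recursive dyadic subdivision of a cube until a
sufficiently populated and sufficiently pure subcube is found, can be sketched
as follows. This is a deliberately simplified illustration, not the full DCSA:
it works in one fixed feature dimension, uses a fixed error threshold, and
omits the removal of classified signals, the growth of the feature space and
the stepwise relaxation of the threshold. The names `find_cluster`, `in_cube`,
`min_count` and `max_depth` are assumptions made for this sketch.

```python
from itertools import product


def in_cube(point, lower, side):
    """Half-open membership test; the outer face at +1 is included."""
    return all(l <= x < l + side or (x == 1.0 and l + side == 1.0)
               for x, l in zip(point, lower))


def find_cluster(a_pts, b_pts, threshold, min_count,
                 lower=None, side=2.0, max_depth=8):
    """Return the first dyadic cube with error rate below threshold, or None."""
    if not a_pts and not b_pts:
        return None
    k = len((a_pts or b_pts)[0])
    if lower is None:
        lower = (-1.0,) * k          # start from the full cube [-1, 1]^k
    n_a = sum(in_cube(p, lower, side) for p in a_pts)
    n_b = sum(in_cube(p, lower, side) for p in b_pts)
    if n_a + n_b <= min_count:       # too few points to count as a cluster
        return None
    error = min(n_a, n_b) / (n_a + n_b)
    if error < threshold:
        return {"lower": lower, "side": side, "n_a": n_a, "n_b": n_b,
                "error": error, "label": 1 if n_a >= n_b else 2}
    if max_depth == 0:               # guard against endless subdivision
        return None
    half = side / 2.0
    for offsets in product((0.0, half), repeat=k):   # the 2^k subcubes
        child = tuple(l + o for l, o in zip(lower, offsets))
        found = find_cluster(a_pts, b_pts, threshold, min_count,
                             child, half, max_depth - 1)
        if found is not None:
            return found
    return None


# One-dimensional toy feature space: class 1 mostly left, class 2 mostly right.
a_points = [(-0.9,), (-0.7,), (-0.6,), (0.2,)]
b_points = [(0.3,), (0.6,), (0.8,), (-0.1,)]
cluster = find_cluster(a_points, b_points, threshold=0.05, min_count=2)
```

In this toy run the full interval and its left half are too mixed, and the
search settles on the subinterval starting at -1 with side length 0.5, which
contains three points of class 1 and none of class 2.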

Less precisely: This algorithm carries out a classification on the signals in
the input signal space $\mathcal{X} = \mathcal{X}^{(1)} \cup \mathcal{X}^{(2)}$
by dividing the set $\mathcal{X}$ into disjoint subsets $S_j$ and performing a
classification on each of these subsets represented in a basis $\Psi^*_j$. Each
$S_j$ consists exclusively of the signals $x_i \in \mathcal{X}$ on which the
most discriminating basis $\Psi^*_j$ selected by (2) performs best. Having
computed a best basis $\Psi^*_1$, the set $S_1$ is selected first, the signals
in $S_1$ are assigned class names and then $S_1$ is deleted from the set
$\mathcal{X}$. Then a new best basis $\Psi^*_2$ for the new $\mathcal{X}$ is
computed by the formula (2), the set $S_2$ is selected, and so on. The
algorithm terminates when the set $\mathcal{X}$ has become sparse. Thus, we see
that by adapting the parameters we can prevent the algorithm from trying to
classify the part of the training dataset which it finds most difficult to
classify, and so we gain a smaller overall training error rate. But this
adjusting of parameters has to be done carefully, so that the algorithm does
not fail to catch important features of the signals. The algorithm selects the
subsets $S_j$ using as few features as possible, starting with only the most
important feature element (= the most discriminating basis element in the best
basis). Then, given some upper limit on the rate of error allowed in the
clusters, if no clean clustering is observed in the feature space of this
single feature element, the algorithm adds information by taking into
consideration also the second best feature element and looks for clustering in
the feature space spanned by the two best feature elements, and so on. If no
clean clustering is observed using all $K$ best feature elements, the upper
error limit is increased and the feature space of the one most important
feature element is again searched for clusters, and so on. Using as few
features as possible reduces the risk of overtraining of the algorithm, that
is, the algorithm selecting features that are too adapted to the specific set
$\mathcal{X}$ of training data. On the other hand, we see that this algorithm
is flexible in its selection of relevant features in that it constructs a
sequence of feature extractors $\{\Psi^*_j\}$ where each $\Psi^*_j$ is
specially adapted to some part $S_j$ of the dataset $\mathcal{X}$. The output
of the algorithm is a sequence $\mathcal{C} = \{C_i\}_{i=1}^{L}$ of dyadic
hypercubes $C_i \subset [-1, 1]^K$ of possibly different dimensions $k_i$,
$1 \le k_i \le K$, $1 \le i \le L$, where to each cube $C_i$ corresponds a
specific feature space $W_{k_i}$ as defined above, and a class name $y_{C_i}$
which equals the name of the majority class of the set
$\{P_{W_{k_i}} x_j\}_{x_j \in \mathcal{X}} \cap C_i = \{P_{W_{k_i}} x_j\}_{x_j \in S_i}$
of datapoints in $W_{k_i}$ that $C_i$ contains. We will call $\mathcal{C}$ a
simple two-class oracle classifier, or simply oracle, for the two-class problem
$d : \mathcal{X}^{(1)} \cup \mathcal{X}^{(2)} \to \mathcal{Y} = \{1, 2\}$.

3.3 On Using and Choosing Oracle Classifiers 

Given a two-class problem
$d : \mathcal{X}^{(1)} \cup \mathcal{X}^{(2)} \to \mathcal{Y} = \{1, 2\}$, we
compute $\mathcal{C} = \{C_j\}_{j=1}^{L}$ by the DCSA. Then, given a sample
$x \in T$, where $T$ is a test dataset, we assign $x$ to a class by the
following procedure: We check if $P_{W_{k_j}} x \in C_j$, starting with index
$j = 1$ and continuing until we get a positive answer for some index
$j' \le L$. We then assign a weighted class $y_{C_{j'}}$-vote to $x$ by
computing the product of $1 - e_{j'}$, where $e_{j'}$ is the error rate of
$C_{j'}$, and its statistical frequency
$(N_A(C_{j'}) + N_B(C_{j'}))/|\mathcal{X}|$. If $P_{W_{k_j}} x \notin C_j$,
$\forall C_j \in \mathcal{C}$, we consider the class of $x$ undetermined.
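
The lookup described above can be sketched as follows. The sketch assumes that
each stored cube is a dictionary holding the indices of the basis coordinates
that span its feature space (`features`), its lower corner and side length,
its class label, its training error rate and its point counts. For simplicity
it also assumes that all cubes refer to one and the same coefficient vector,
which corresponds to the case of a single best basis; with several bases in
sequence the sample would have to be expanded in the basis belonging to each
cube. All field and function names are hypothetical.

```python
def classify(coeffs, oracle, n_train):
    """Return (label, weight) from the first cube containing the sample, else None."""
    for cube in oracle:
        # Project onto the feature space of this cube.
        point = [coeffs[m] for m in cube["features"]]
        inside = all(l <= p <= l + cube["side"]
                     for p, l in zip(point, cube["lower"]))
        if inside:
            frequency = (cube["n_a"] + cube["n_b"]) / n_train
            return cube["label"], (1.0 - cube["error"]) * frequency
    return None  # class undetermined


# A toy oracle with a one-dimensional and a two-dimensional cube.
oracle = [
    {"features": [3], "lower": [-1.0], "side": 0.5,
     "label": 1, "error": 0.0, "n_a": 30, "n_b": 0},
    {"features": [3, 7], "lower": [0.0, 0.0], "side": 1.0,
     "label": 2, "error": 0.1, "n_a": 2, "n_b": 18},
]
sample_1 = [0.0, 0.0, 0.0, -0.8, 0.0, 0.0, 0.0, 0.1]
sample_2 = [0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.2]
sample_3 = [0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, -0.5]
vote_1 = classify(sample_1, oracle, n_train=200)
vote_2 = classify(sample_2, oracle, n_train=200)
vote_3 = classify(sample_3, oracle, n_train=200)
```

The first sample falls in the first cube, the second only in the second cube,
and the third in neither, so its class is left undetermined.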

Different choices of discrimination measure or different settings of the
parameters in the DCSA result in different classifiers. For a two-class
problem, we can construct several classifiers by using different performance
measures/parameters, and let the weighted majority vote of the classifiers
decide whether a sample $x \in T$ is of class 1 or class 2. For an n-class
problem, $n > 2$, we will apply the method of splitting the n-class problem
into n two-class problems, $d : \mathcal{X} \to \{i, 0\}$, $1 \le i \le n$, as
proposed in Saito, Coifman (1996), by splitting the training data set into two
sets of class $i$ and not $i$. One then constructs oracles for each two-class
problem. To classify an unknown sample $x \in T$, we compute weighted class
votes as explained above for the set of oracles and assign $x$ to the majority
vote class.
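
A weighted majority vote over several oracles could look like the sketch below.
It assumes that each oracle reports either a pair (label, weight) or `None`
when the class is undetermined, and that undetermined votes are simply
ignored; how votes for the "not i" class are treated in the n-class case is
not specified in the text and is left out here. The function name
`weighted_majority` is illustrative.

```python
from collections import defaultdict


def weighted_majority(votes):
    """Sum the weights per label and return the label with the largest total."""
    totals = defaultdict(float)
    for vote in votes:
        if vote is not None:          # skip undetermined votes
            label, weight = vote
            totals[label] += weight
    if not totals:
        return None                   # no oracle could classify the sample
    return max(totals, key=totals.get)


decision = weighted_majority([(1, 0.15), (2, 0.09), None, (2, 0.10)])
undetermined = weighted_majority([None, None])
```

Here label 2 wins with a total weight of 0.19 against 0.15 for label 1.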

4 Experimental Results 

In some of the calls to the DCSA in the experiments described below we allowed
the algorithm to select a best basis only once; we call this method an LDB-method
(Local Discriminant Basis-method). In the cases where we allowed the algorithm
to select multiple different best bases in sequence, we call the method an
MLDB-method (Multiple Local Discriminant Basis-method). In the cases where we
organized the classifiers resulting from different calls (calls with different
discrimination measures) to the DCSA into a classifier by taking the majority
vote over these classifiers, we call the method a superposition LDB or
MLDB-method, denoted SLDB or SMLDB-method, respectively. In all three examples
below we generated 10 independent realizations of both the training dataset and
the test dataset. The results shown in Table 1, Table 2 and Table 3 are the
means over the 10 simulations corresponding to the 10 independent realizations
of the datasets.



4.1 Example 1 

We consider a two class waveform classification problem as presented in Foss- 
gaard (1997). We generated sets of 100 training signals and 1000 test signals of 
length 1024 for each class by the formula 



$$ Q_n(R, \theta, t) = C(R, t) \sum_{j=1}^{n} A_n(j) \, e^{ik(R - r_j \cos(\theta - \theta_j))}, \qquad (10) $$



where we have:

$C(R, t)$ is considered constant = 1 for simplicity.
$R = 10^4$.
$k = 100$.
$A_n(j) = 1/n$.
$r_j$ is a random variable uniformly distributed on [1, 10].
$\theta_j$ is a random variable uniformly distributed on

n n 4 

For each n-tuple of realizations $\{r_j, \theta_j\}_{j=1}^{n}$ of the pair of
random variables $r_j$, $\theta_j$, we generate a discrete signal
$S_n(\theta)$ by uniformly sampling the real part of $Q_n(R, \theta, t)$ 1024
times in the variable $\theta$ with sampling density
$2\pi/(16 \cdot k) = 2\pi/1600$. We generated data sets by extracting
realizations of $S_n/\|S_n\|_2$ smoothly from a fixed sampling interval. In
this problem we used $n = 3$, $n = 4$ in (10) to define two classes of signals
and the coiflet with filterlength 18 as dictionary. All calls to the DCSA in
this experiment were made with $K = 5$, $\delta = 0.01$, $\eta = 0.05$,
$\mu = 0.10$, $\nu = 0.05$. The results are shown in Table 1.



4.2 Example 2 

This example is identical to Example 1 except that we used $n = 4$, $n = 5$
in (10) to define the two signal classes. We used the coiflet with filterlength
18 as dictionary. All calls to the DCSA in this experiment were made with
$K = 5$, $\delta = 0.01$, $\eta = 0.05$, $\mu = 0.10$, $\nu = 0.05$. The
results are shown in Table 2.
                 Classification rate (%)                  Error rate (%)
           Training data       Test data   Training data       Test data
Method     Total       σ   Total       σ   Total       σ   Total       σ
LDB1        99.7     0.7    99.5     0.9    19.9     3.0    29.8     2.8
MLDB1       98.5     1.4    98.4     1.4     8.6     1.1    23.5     2.7
LDB2        97.6     2.0    96.3     3.6    16.0     4.5    23.5     4.4
MLDB2       95.2     1.9    93.8     2.3    12.9     3.1    23.7     3.4
LDB3        98.9     1.7    98.8     2.1    16.6     4.6    24.6     4.9
MLDB3       98.4     1.1    99.5     0.6    13.7     3.1    24.4     1.9
SLDB         100     0.0     100     0.0    14.7     3.9    22.2     1.5
SMLDB        100     0.0     100     0.0     9.1     3.3    20.4     2.0

Table 1: The average classification rates and the corresponding error rates over
10 simulations from Example 1. LDB1 is the LDB selected by the measure λ'.
MLDB1 is the MLDB selected by the measure λ'. LDB2 is the LDB selected by
the measure λ''. MLDB2 is the MLDB selected by the measure λ''. LDB3 is the
LDB selected by the measure λ. MLDB3 is the MLDB selected by the measure
λ. SLDB is the superposition of methods LDB1, LDB2, LDB3. SMLDB is the
superposition of the methods MLDB1, MLDB2, MLDB3. σ is the square root
of the sample variance.

4.3 Example 3 

We consider a three-class waveform classification problem as presented in Saito
(1994). We generated sets of 100 training signals and 1000 test signals of length
32 for each class by first extracting signal samples by the formulas

$f_1(i) = u h_1(i) + (1 - u) h_2(i) + \epsilon(i)$ for Class 1
$f_2(i) = u h_1(i) + (1 - u) h_3(i) + \epsilon(i)$ for Class 2
$f_3(i) = u h_2(i) + (1 - u) h_3(i) + \epsilon(i)$ for Class 3,

where $i = 1, \ldots, 32$, $h_1(i) = \max(6 - |i - 7|, 0)$,
$h_2(i) = h_1(i - 8)$, $h_3(i) = h_1(i - 4)$, $u$ is a uniform random variable
on the interval (0, 1), and $\epsilon(i)$ are standard normal variates. We then
normalized the signals in the energy norm by setting
$f_1(i) = f_1(i)/\|f_1\|_2$, $f_2(i) = f_2(i)/\|f_2\|_2$,
$f_3(i) = f_3(i)/\|f_3\|_2$, $i = 1, \ldots, 32$. We used the coiflet with
filterlength 6 as a dictionary for this problem. All calls to the DCSA in this
experiment were made with $K = 5$, $\delta = 0.01$, $\eta = 0.05$,
$\mu = 0.20$, $\nu = 0.05$. The results are shown in Table 3.
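
As an illustration of the data model in this example, the three waveform
classes can be generated as in the sketch below. It uses the pseudo-random
generator of the Python standard library instead of the NAG routines used in
the experiments, so the realizations differ from those behind the reported
results; the names `make_signal` and `PAIRS` are illustrative.

```python
import math
import random


def h1(i):
    return max(6 - abs(i - 7), 0)


def h2(i):
    return h1(i - 8)


def h3(i):
    return h1(i - 4)


# Which two triangular waveforms are mixed in each class.
PAIRS = {1: (h1, h2), 2: (h1, h3), 3: (h2, h3)}


def make_signal(cls, rng):
    """One unit-norm noisy waveform of class cls (1, 2 or 3), length 32."""
    first, second = PAIRS[cls]
    u = rng.random()                      # uniform mixing weight
    f = [u * first(i) + (1.0 - u) * second(i) + rng.gauss(0.0, 1.0)
         for i in range(1, 33)]
    norm = math.sqrt(sum(v * v for v in f))
    return [v / norm for v in f]          # normalize in the energy norm


rng = random.Random(0)
training_set = {cls: [make_signal(cls, rng) for _ in range(100)]
                for cls in (1, 2, 3)}
```

Each class then holds 100 training signals of length 32 with unit energy, as
in the setup described above.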

5 Comments 

5.1 Comments to Example 1 

In this example we achieved the best result by the superposition method using
multiple LDB's, denoted SMLDB. We see that the generalized methods MLDB1,
MLDB2, MLDB3 are almost indistinguishable in this example. We conclude that
our new measures λ', λ'' hardly yield a significantly better classification than
the original measure λ; the positive effect is in any case small. Furthermore, for
the measure λ' we do get better results by the generalized method, whereas for
the measures λ'' and λ the positive effect of generalizing is more doubtful. But
all in all, it seems we are a little better off with either measure λ', λ'' than the
original λ.

                 Classification rate (%)                  Error rate (%)
           Training data       Test data   Training data       Test data
Method     Total       σ   Total       σ   Total       σ   Total       σ
LDB1        99.3     1.4    98.7     2.3    11.9     3.5    19.9     5.4
MLDB1       98.0     1.4    97.3     2.3     6.4     2.9    20.5     4.3
LDB2        96.9     2.6    96.9     2.5    10.5     2.7    17.5     3.2
MLDB2       96.0     2.2    94.1     3.6     9.5     1.9    19.0     2.6
LDB3        99.1     1.7    99.6     0.9    24.2     5.6    32.6     7.9
MLDB3       98.5     1.4    99.5     0.5    21.6     3.6    35.8     4.2
SLDB         100     0.0     100     0.0    15.0     4.9    21.8     4.5
SMLDB        100     0.0     100     0.0     8.4     3.7    20.1     3.2

Table 2: The average classification rates and the corresponding error rates over
10 simulations from Example 2. LDB1 is the LDB selected by the measure λ'.
MLDB1 is the MLDB selected by the measure λ'. LDB2 is the LDB selected by
the measure λ''. MLDB2 is the MLDB selected by the measure λ''. LDB3 is the
LDB selected by the measure λ. MLDB3 is the MLDB selected by the measure
λ. SLDB is the superposition of methods LDB1, LDB2, LDB3. SMLDB is the
superposition of the methods MLDB1, MLDB2, MLDB3. σ is the square root
of the sample variance.

5.2 Comments to Example 2 

In this example we achieved the best result with the method LDB2. We see that
both discrimination measures λ', λ'' clearly outperform the original measure λ
in this problem. As in the previous example, the measures λ' and λ'' yield about
the same results with MLDB-methods. When not taking superpositions of several
classifiers, the generalized MLDB-method does not yield any improvement
in results on test data; rather, it seems that this method adapts too much to
training data in this example. Furthermore, due to the poor performance of the
measure λ in this example, we get worse results with superposition methods
than when using the best single classifier. But we could expect to
further lower the best error rate on test data by combining classifiers from the
measures λ', λ'' only.

5.3 Comments to Example 3 

In this example we achieved the best result by the method SMLDB, and we
see that superposition methods are clearly favourable in this case. However,
it seems to make little difference which measure we are using when not taking
superpositions of several classifiers. We remark that both the measures λ', λ''
select the standard basis as the most discriminating basis in the first steps in
the DCSA, whereas λ does not choose this basis in any step.
                 Classification rate (%)                  Error rate (%)
           Training data       Test data   Training data       Test data
Method     Total       σ   Total       σ   Total       σ   Total       σ
LDB1         100     0.0     100     0.0    23.7     1.9    28.2     0.8
MLDB1        100     0.0     100     0.0    22.9     2.1    28.5     2.7
LDB2         100     0.0     100     0.0    23.6     2.4    27.9     1.9
MLDB2        100     0.0     100     0.0    20.7     2.7    27.7     1.5
LDB3         100     0.0     100     0.0    25.0     2.6    29.1     1.8
MLDB3        100     0.0     100     0.0    23.0     2.6    26.2     2.6
SLDB         100     0.0     100     0.0    18.7     1.9    22.7     0.9
SMLDB        100     0.0     100     0.0    15.7     1.9    20.5     1.0

Table 3: The average classification rates and the corresponding error rates over
10 simulations from Example 3. LDB1 is the LDB selected by the measure λ'.
MLDB1 is the MLDB selected by the measure λ'. LDB2 is the LDB selected by
the measure λ''. MLDB2 is the MLDB selected by the measure λ''. LDB3 is the
LDB selected by the measure λ. MLDB3 is the MLDB selected by the measure
λ. SLDB is the result from a superposition of the methods LDB1, LDB2, LDB3.
SMLDB is the result from a superposition of the methods MLDB1, MLDB2,
MLDB3. σ is the square root of the sample variance.

5.4 Conclusion 

We have shown that estimating expectations and variances directly from the 
expansion coefficients of the datasets in the binary-tree structured dictionary of 
bases may lead to better results than when using the energy-density dictionaries 
of bases. Also, we have shown that comparing/combining different discrimina- 
tion measures in classification problems may lead to significant improvements 
in the success of the classification methods. 

A Applied software and hardware 

All algorithms and transforms used in the numerical experiments, except some 
of the random number generators described below, were implemented in the 
computer language C++ and compiled with the GNU project C++ compiler 
on a HP K260 machine with a PA 8000 processor. 

A.1 Random number generators

In the examples we used the Fortran NAG routines G05DAF, G05FAF for
generating random numbers with uniform distribution, and G05FDF for generating
random numbers with standard normal distribution.